Let’s start with a summary of the data, followed by its first six rows.
## ListingKey ListingNumber
## 17A93590655669644DB4C06: 6 Min. : 4
## 349D3587495831350F0F648: 4 1st Qu.: 400919
## 47C1359638497431975670B: 4 Median : 600554
## 8474358854651984137201C: 4 Mean : 627886
## DE8535960513435199406CE: 4 3rd Qu.: 892634
## 04C13599434217079754AEE: 3 Max. :1255725
## (Other) :113912
## ListingCreationDate CreditGrade Term
## 2013-10-02 17:20:16.550000000: 6 :84984 Min. :12.00
## 2013-08-28 20:31:41.107000000: 4 C : 5649 1st Qu.:36.00
## 2013-09-08 09:27:44.853000000: 4 D : 5153 Median :36.00
## 2013-12-06 05:43:13.830000000: 4 B : 4389 Mean :40.83
## 2013-12-06 11:44:58.283000000: 4 AA : 3509 3rd Qu.:36.00
## 2013-08-21 07:25:22.360000000: 3 HR : 3508 Max. :60.00
## (Other) :113912 (Other): 6745
## LoanStatus ClosedDate
## Current :56576 :58848
## Completed :38074 2014-03-04 00:00:00: 105
## Chargedoff :11992 2014-02-19 00:00:00: 100
## Defaulted : 5018 2014-02-11 00:00:00: 92
## Past Due (1-15 days) : 806 2012-10-30 00:00:00: 81
## Past Due (31-60 days): 363 2013-02-26 00:00:00: 78
## (Other) : 1108 (Other) :54633
## BorrowerAPR BorrowerRate LenderYield
## Min. :0.00653 Min. :0.0000 Min. :-0.0100
## 1st Qu.:0.15629 1st Qu.:0.1340 1st Qu.: 0.1242
## Median :0.20976 Median :0.1840 Median : 0.1730
## Mean :0.21883 Mean :0.1928 Mean : 0.1827
## 3rd Qu.:0.28381 3rd Qu.:0.2500 3rd Qu.: 0.2400
## Max. :0.51229 Max. :0.4975 Max. : 0.4925
## NA's :25
## EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## Min. :-0.183 Min. :0.005 Min. :-0.183
## 1st Qu.: 0.116 1st Qu.:0.042 1st Qu.: 0.074
## Median : 0.162 Median :0.072 Median : 0.092
## Mean : 0.169 Mean :0.080 Mean : 0.096
## 3rd Qu.: 0.224 3rd Qu.:0.112 3rd Qu.: 0.117
## Max. : 0.320 Max. :0.366 Max. : 0.284
## NA's :29084 NA's :29084 NA's :29084
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## Min. :1.000 :29084 Min. : 1.00
## 1st Qu.:3.000 C :18345 1st Qu.: 4.00
## Median :4.000 B :15581 Median : 6.00
## Mean :4.072 A :14551 Mean : 5.95
## 3rd Qu.:5.000 D :14274 3rd Qu.: 8.00
## Max. :7.000 E : 9795 Max. :11.00
## NA's :29084 (Other):12307 NA's :29084
## ListingCategory..numeric. BorrowerState
## Min. : 0.000 CA :14717
## 1st Qu.: 1.000 TX : 6842
## Median : 1.000 NY : 6729
## Mean : 2.774 FL : 6720
## 3rd Qu.: 3.000 IL : 5921
## Max. :20.000 : 5515
## (Other):67493
## Occupation EmploymentStatus
## Other :28617 Employed :67322
## Professional :13628 Full-time :26355
## Computer Programmer : 4478 Self-employed: 6134
## Executive : 4311 Not available: 5347
## Teacher : 3759 Other : 3806
## Administrative Assistant: 3688 : 2255
## (Other) :55456 (Other) : 2718
## EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## Min. : 0.00 False:56459 False:101218
## 1st Qu.: 26.00 True :57478 True : 12719
## Median : 67.00
## Mean : 96.07
## 3rd Qu.:137.00
## Max. :755.00
## NA's :7625
## GroupKey DateCreditPulled
## :100596 2013-12-23 09:38:12: 6
## 783C3371218786870A73D20: 1140 2013-11-21 09:09:41: 4
## 3D4D3366260257624AB272D: 916 2013-12-06 05:43:16: 4
## 6A3B336601725506917317E: 698 2014-01-14 20:17:49: 4
## FEF83377364176536637E50: 611 2014-02-09 12:14:41: 4
## C9643379247860156A00EC0: 342 2013-09-27 22:04:54: 3
## (Other) : 9634 (Other) :113912
## CreditScoreRangeLower CreditScoreRangeUpper
## Min. : 0.0 Min. : 19.0
## 1st Qu.:660.0 1st Qu.:679.0
## Median :680.0 Median :699.0
## Mean :685.6 Mean :704.6
## 3rd Qu.:720.0 3rd Qu.:739.0
## Max. :880.0 Max. :899.0
## NA's :591 NA's :591
## FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
## : 697 Min. : 0.00 Min. : 0.00
## 1993-12-01 00:00:00: 185 1st Qu.: 7.00 1st Qu.: 6.00
## 1994-11-01 00:00:00: 178 Median :10.00 Median : 9.00
## 1995-11-01 00:00:00: 168 Mean :10.32 Mean : 9.26
## 1990-04-01 00:00:00: 161 3rd Qu.:13.00 3rd Qu.:12.00
## 1995-03-01 00:00:00: 159 Max. :59.00 Max. :54.00
## (Other) :112389 NA's :7604 NA's :7604
## TotalCreditLinespast7years OpenRevolvingAccounts
## Min. : 2.00 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 4.00
## Median : 25.00 Median : 6.00
## Mean : 26.75 Mean : 6.97
## 3rd Qu.: 35.00 3rd Qu.: 9.00
## Max. :136.00 Max. :51.00
## NA's :697
## OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## Min. : 0.0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 114.0 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 271.0 Median : 1.000 Median : 4.000
## Mean : 398.3 Mean : 1.435 Mean : 5.584
## 3rd Qu.: 525.0 3rd Qu.: 2.000 3rd Qu.: 7.000
## Max. :14985.0 Max. :105.000 Max. :379.000
## NA's :697 NA's :1159
## CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0 Median : 0.000
## Mean : 0.5921 Mean : 984.5 Mean : 4.155
## 3rd Qu.: 0.0000 3rd Qu.: 0.0 3rd Qu.: 3.000
## Max. :83.0000 Max. :463881.0 Max. :99.000
## NA's :697 NA's :7622 NA's :990
## PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
## Min. : 0.0000 Min. : 0.000 Min. : 0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 3121
## Median : 0.0000 Median : 0.000 Median : 8549
## Mean : 0.3126 Mean : 0.015 Mean : 17599
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 19521
## Max. :38.0000 Max. :20.000 Max. :1435667
## NA's :697 NA's :7604 NA's :7604
## BankcardUtilization AvailableBankcardCredit TotalTrades
## Min. :0.000 Min. : 0 Min. : 0.00
## 1st Qu.:0.310 1st Qu.: 880 1st Qu.: 15.00
## Median :0.600 Median : 4100 Median : 22.00
## Mean :0.561 Mean : 11210 Mean : 23.23
## 3rd Qu.:0.840 3rd Qu.: 13180 3rd Qu.: 30.00
## Max. :5.950 Max. :646285 Max. :126.00
## NA's :7604 NA's :7544 NA's :7544
## TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## Min. :0.000 Min. : 0.000
## 1st Qu.:0.820 1st Qu.: 0.000
## Median :0.940 Median : 0.000
## Mean :0.886 Mean : 0.802
## 3rd Qu.:1.000 3rd Qu.: 1.000
## Max. :1.000 Max. :20.000
## NA's :7544 NA's :7544
## DebtToIncomeRatio IncomeRange IncomeVerifiable
## Min. : 0.000 $25,000-49,999:32192 False: 8669
## 1st Qu.: 0.140 $50,000-74,999:31050 True :105268
## Median : 0.220 $100,000+ :17337
## Mean : 0.276 $75,000-99,999:16916
## 3rd Qu.: 0.320 Not displayed : 7741
## Max. :10.010 $1-24,999 : 7274
## NA's :8554 (Other) : 1427
## StatedMonthlyIncome LoanKey TotalProsperLoans
## Min. : 0 CB1B37030986463208432A1: 6 Min. :0.00
## 1st Qu.: 3200 2DEE3698211017519D7333F: 4 1st Qu.:1.00
## Median : 4667 9F4B37043517554537C364C: 4 Median :1.00
## Mean : 5608 D895370150591392337ED6D: 4 Mean :1.42
## 3rd Qu.: 6825 E6FB37073953690388BC56D: 4 3rd Qu.:2.00
## Max. :1750003 0D8F37036734373301ED419: 3 Max. :8.00
## (Other) :113912 NA's :91852
## TotalProsperPaymentsBilled OnTimeProsperPayments
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 9.00
## Median : 16.00 Median : 15.00
## Mean : 22.93 Mean : 22.27
## 3rd Qu.: 33.00 3rd Qu.: 32.00
## Max. :141.00 Max. :141.00
## NA's :91852 NA's :91852
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 0.61 Mean : 0.05
## 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :42.00 Max. :21.00
## NA's :91852 NA's :91852
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## Min. : 0 Min. : 0
## 1st Qu.: 3500 1st Qu.: 0
## Median : 6000 Median : 1627
## Mean : 8472 Mean : 2930
## 3rd Qu.:11000 3rd Qu.: 4127
## Max. :72499 Max. :23451
## NA's :91852 NA's :91852
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## Min. :-209.00 Min. : 0.0
## 1st Qu.: -35.00 1st Qu.: 0.0
## Median : -3.00 Median : 0.0
## Mean : -3.22 Mean : 152.8
## 3rd Qu.: 25.00 3rd Qu.: 0.0
## Max. : 286.00 Max. :2704.0
## NA's :95009
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## Min. : 0.00 Min. : 0.0 Min. : 1
## 1st Qu.: 9.00 1st Qu.: 6.0 1st Qu.: 37332
## Median :14.00 Median : 21.0 Median : 68599
## Mean :16.27 Mean : 31.9 Mean : 69444
## 3rd Qu.:22.00 3rd Qu.: 65.0 3rd Qu.:101901
## Max. :44.00 Max. :100.0 Max. :136486
## NA's :96985
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## Min. : 1000 2014-01-22 00:00:00: 491 Q4 2013:14450
## 1st Qu.: 4000 2013-11-13 00:00:00: 490 Q1 2014:12172
## Median : 6500 2014-02-19 00:00:00: 439 Q3 2013: 9180
## Mean : 8337 2013-10-16 00:00:00: 434 Q2 2013: 7099
## 3rd Qu.:12000 2014-01-28 00:00:00: 339 Q3 2012: 5632
## Max. :35000 2013-09-24 00:00:00: 316 Q2 2012: 5061
## (Other) :111428 (Other):60343
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 63CA34120866140639431C9: 9 Min. : 0.0 Min. : -2.35
## 16083364744933457E57FB9: 8 1st Qu.: 131.6 1st Qu.: 1005.76
## 3A2F3380477699707C81385: 8 Median : 217.7 Median : 2583.83
## 4D9C3403302047712AD0CDD: 8 Mean : 272.5 Mean : 4183.08
## 739C338135235294782AE75: 8 3rd Qu.: 371.6 3rd Qu.: 5548.40
## 7E1733653050264822FAA3D: 8 Max. :2251.5 Max. :40702.39
## (Other) :113888
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## Min. : 0.0 Min. : -2.35 Min. :-664.87
## 1st Qu.: 500.9 1st Qu.: 274.87 1st Qu.: -73.18
## Median : 1587.5 Median : 700.84 Median : -34.44
## Mean : 3105.5 Mean : 1077.54 Mean : -54.73
## 3rd Qu.: 4000.0 3rd Qu.: 1458.54 3rd Qu.: -13.92
## Max. :35000.0 Max. :15617.03 Max. : 32.06
##
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## Min. :-9274.75 Min. : -94.2 Min. : -954.5
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.00 Median : 0.0 Median : 0.0
## Mean : -14.24 Mean : 700.4 Mean : 681.4
## 3rd Qu.: 0.00 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. : 0.00 Max. :25000.0 Max. :25000.0
##
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## Min. : 0.00 Min. :0.7000 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.:1.0000 1st Qu.: 0.00000
## Median : 0.00 Median :1.0000 Median : 0.00000
## Mean : 25.14 Mean :0.9986 Mean : 0.04803
## 3rd Qu.: 0.00 3rd Qu.:1.0000 3rd Qu.: 0.00000
## Max. :21117.90 Max. :1.0125 Max. :39.00000
##
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## Min. : 0.00000 Min. : 0.00 Min. : 1.00
## 1st Qu.: 0.00000 1st Qu.: 0.00 1st Qu.: 2.00
## Median : 0.00000 Median : 0.00 Median : 44.00
## Mean : 0.02346 Mean : 16.55 Mean : 80.48
## 3rd Qu.: 0.00000 3rd Qu.: 0.00 3rd Qu.: 115.00
## Max. :33.00000 Max. :25000.00 Max. :1189.00
##
## ListingKey ListingNumber ListingCreationDate
## 1 1021339766868145413AB3B 193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1 1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A 81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A 658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2 909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D 1074836 2013-12-14 08:26:37.093000000
## CreditGrade Term LoanStatus ClosedDate BorrowerAPR BorrowerRate
## 1 C 36 Completed 2009-08-14 00:00:00 0.16516 0.1580
## 2 36 Current 0.12016 0.0920
## 3 HR 36 Completed 2009-12-17 00:00:00 0.28269 0.2750
## 4 36 Current 0.12528 0.0974
## 5 36 Current 0.24614 0.2085
## 6 60 Current 0.15425 0.1314
## LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1 0.1380 NA NA NA
## 2 0.0820 0.07960 0.0249 0.05470
## 3 0.2400 NA NA NA
## 4 0.0874 0.08490 0.0249 0.06000
## 5 0.1985 0.18316 0.0925 0.09066
## 6 0.1214 0.11567 0.0449 0.07077
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1 NA NA
## 2 6 A 7
## 3 NA NA
## 4 6 A 9
## 5 3 D 4
## 6 5 B 10
## ListingCategory..numeric. BorrowerState Occupation EmploymentStatus
## 1 0 CO Other Self-employed
## 2 2 CO Professional Employed
## 3 0 GA Other Not available
## 4 16 GA Skilled Labor Employed
## 5 2 MN Executive Employed
## 6 1 NM Professional Employed
## EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1 2 True True
## 2 44 False False
## 3 NA False True
## 4 113 True False
## 5 44 True False
## 6 82 True False
## GroupKey DateCreditPulled
## 1 2007-08-26 18:41:46.780000000
## 2 2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4 2012-10-22 11:02:32
## 5 2013-09-14 18:38:44
## 6 2013-12-14 08:26:40
## CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1 640 659 2001-10-11 00:00:00
## 2 680 699 1996-03-18 00:00:00
## 3 480 499 2002-07-27 00:00:00
## 4 800 819 1983-02-28 00:00:00
## 5 680 699 2004-02-20 00:00:00
## 6 740 759 1973-03-01 00:00:00
## CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1 5 4 12
## 2 14 14 29
## 3 NA NA 3
## 4 5 5 29
## 5 19 19 49
## 6 21 17 49
## OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1 1 24 3
## 2 13 389 3
## 3 0 0 0
## 4 7 115 0
## 5 6 220 1
## 6 13 1410 0
## TotalInquiries CurrentDelinquencies AmountDelinquent
## 1 3 2 472
## 2 5 0 0
## 3 1 1 NA
## 4 1 4 10056
## 5 9 0 0
## 6 2 0 0
## DelinquenciesLast7Years PublicRecordsLast10Years
## 1 4 0
## 2 0 1
## 3 0 0
## 4 14 0
## 5 0 0
## 6 0 0
## PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1 0 0 0.00
## 2 0 3989 0.21
## 3 NA NA NA
## 4 0 1444 0.04
## 5 0 6193 0.81
## 6 0 62999 0.39
## AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1 1500 11 0.81
## 2 10266 29 1.00
## 3 NA NA NA
## 4 30754 26 0.76
## 5 695 39 0.95
## 6 86509 47 1.00
## TradesOpenedLast6Months DebtToIncomeRatio IncomeRange
## 1 0 0.17 $25,000-49,999
## 2 2 0.18 $50,000-74,999
## 3 NA 0.06 Not displayed
## 4 0 0.15 $25,000-49,999
## 5 2 0.26 $100,000+
## 6 0 0.36 $100,000+
## IncomeVerifiable StatedMonthlyIncome LoanKey
## 1 True 3083.333 E33A3400205839220442E84
## 2 True 6125.000 9E3B37071505919926B1D82
## 3 True 2083.333 6954337960046817851BCB2
## 4 True 2875.000 A0393664465886295619C51
## 5 True 9583.333 A180369302188889200689E
## 6 True 8333.333 C3D63702273952547E79520
## TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 1 11 11
## 6 NA NA NA
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 0 0
## 6 NA NA
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 11000 9947.9
## 6 NA NA
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1 NA 0
## 2 NA 0
## 3 NA 0
## 4 NA 0
## 5 NA 0
## 6 NA 0
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1 NA 78 19141
## 2 NA 0 134815
## 3 NA 86 6466
## 4 NA 16 77296
## 5 NA 6 102670
## 6 NA 3 123257
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1 9425 2007-09-12 00:00:00 Q3 2007
## 2 10000 2014-03-03 00:00:00 Q1 2014
## 3 3001 2007-01-17 00:00:00 Q1 2007
## 4 10000 2012-11-01 00:00:00 Q4 2012
## 5 15000 2013-09-20 00:00:00 Q3 2013
## 6 15000 2013-12-24 00:00:00 Q4 2013
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA 330.43 11396.14
## 2 1D13370546739025387B2F4 318.93 0.00
## 3 5F7033715035555618FA612 123.32 4186.63
## 4 9ADE356069835475068C6D2 321.45 5143.20
## 5 36CE356043264555721F06C 563.97 2819.85
## 6 874A3701157341738DE458F 342.37 679.34
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1 9425.00 1971.14 -133.18
## 2 0.00 0.00 0.00
## 3 3001.00 1185.63 -24.20
## 4 4091.09 1052.11 -108.01
## 5 1563.22 1256.63 -60.27
## 6 351.89 327.45 -25.33
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1 0 1 0
## 2 0 1 0
## 3 0 1 0
## 4 0 1 0
## 5 0 1 0
## 6 0 1 0
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1 0 0 258
## 2 0 0 1
## 3 0 0 41
## 4 0 0 158
## 5 0 0 20
## 6 0 0 1
I’m interested in an overview of loan statuses, particularly defaulted loans.
I’m unclear what the “Chargedoff” status means. Looking up the definition, it appears that a charged-off loan is one that is severely past due and that the lender has written off as unlikely to be repaid. For that reason I’m going to combine the charged-off loans with the defaulted ones into a new “unpaid” category. I’m also going to merge all of the past-due statuses into a single category to clear up the clutter.
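The regrouping described above can be sketched in base R roughly as follows. This is a minimal illustration on a handful of made-up rows, not the code used for the chart; `collapse_status` and the group labels "Unpaid" and "Past Due" are names assumed here for illustration.

```r
# Toy stand-in for loanData$LoanStatus; the real column has more levels.
status <- c("Current", "Completed", "Chargedoff", "Defaulted",
            "Past Due (1-15 days)", "Past Due (31-60 days)", "Current")

# Hypothetical helper: collapse raw statuses into broader groups.
collapse_status <- function(x) {
  x <- as.character(x)
  out <- x
  # Charged-off and defaulted loans become a single "Unpaid" group.
  out[x %in% c("Chargedoff", "Defaulted")] <- "Unpaid"
  # Every "Past Due (...)" bucket becomes a single "Past Due" group.
  out[grepl("^Past Due", x)] <- "Past Due"
  factor(out)
}

status_group <- collapse_status(status)
table(status_group)
```

Working on a character copy before converting back to a factor avoids having to edit factor levels in place.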
This new chart illustrates the various loan statuses more succinctly.
I’m also curious how the loan durations break down, so here is a histogram.
It looks like there are only three loan terms at play here: 12, 36, and 60 months.
A couple of other quick histograms: BorrowerAPR and BorrowerRate.
I suspect that unpaid or defaulted loans are largely tied to income. I want to see how defaulted loans change with income. I’m going to use my own table that aggregates the different default statuses.
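One way to build such a table is a cross-tabulation of income range against the regrouped status, converted to row shares so that income brackets of different sizes are comparable. The sketch below uses invented rows; `StatusGroup` is an assumed column name, while `IncomeRange` comes from the dataset.

```r
# Toy data; IncomeRange mirrors the dataset, StatusGroup is hypothetical,
# and the values are made up for illustration.
loans <- data.frame(
  IncomeRange = c("$25,000-49,999", "$25,000-49,999", "$50,000-74,999",
                  "$50,000-74,999", "$100,000+", "$100,000+"),
  StatusGroup = c("Unpaid", "Current", "Current",
                  "Unpaid", "Current", "Completed")
)

# Counts of each status group within each income range.
counts <- table(loans$IncomeRange, loans$StatusGroup)

# Row-wise proportions: each income range sums to 1.
shares <- prop.table(counts, margin = 1)

# Share of loans in each income range that ended up unpaid.
unpaid_share <- shares[, "Unpaid"]
unpaid_share
```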
Not very telling. I want to home in on the SevereDelinquent parameter, so let me change the focus. It seems to be a pretty even spread. Perhaps the amount of the payment has an impact.
Interestingly, it appears that there’s a heavier concentration at lower monthly payments. It is not the expensive payments that lead most strongly to delinquency or default.
Changing direction, I’d like to know if people with higher take-home incomes tend to have fewer delinquencies. There are very few instances of delinquencies, it seems; I had to manipulate the values quite a bit. There seems to be a (very) slight decrease in delinquencies as stated monthly income increases, as the negative coefficient in the correlation test confirms.
##
## Pearson's product-moment correlation
##
## data: loanData$DelinquenciesLast7Years and loanData$StatedMonthlyIncome
## t = -8.6753, df = 112940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03163235 -0.01997628
## sample estimates:
## cor
## -0.02580519
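To put the size of this correlation in perspective, the figures reported above can be checked by hand. The sketch below only re-derives quantities from the printed estimate and degrees of freedom; the variable names are mine.

```r
r  <- -0.02580519   # sample correlation from the output above
df <- 112940        # degrees of freedom from the output above

# t statistic for Pearson's r: t = r * sqrt(df) / sqrt(1 - r^2)
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Squared correlation: share of variance the two variables have in common.
r_squared <- r^2

# The sign of r gives the direction of the linear relationship.
direction <- if (r < 0) "decrease" else "increase"

t_stat
r_squared
direction
```

The tiny p-value reflects the very large sample rather than a strong relationship: the squared correlation is well under one percent.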
I’d like to work with credit score, but the data gives only a range. As a simple workaround I’m going to take the average of the lower and upper bounds of the range.
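A minimal sketch of that workaround, using the first three score ranges shown in the data preview plus one missing row. The column name `AverageCreditScore` matches the one that appears in the regression output later; the small data frame is a stand-in for the full dataset.

```r
# Three ranges taken from the preview rows, plus one missing value.
scores <- data.frame(
  CreditScoreRangeLower = c(640, 680, 480, NA),
  CreditScoreRangeUpper = c(659, 699, 499, NA)
)

# Midpoint of the reported range; NA bounds propagate to an NA average.
scores$AverageCreditScore <-
  (scores$CreditScoreRangeLower + scores$CreditScoreRangeUpper) / 2

scores
```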
Let’s see the histogram of this data.
I want to add the mean (blue) and median (green) credit scores as vertical lines.
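A base-R sketch of a histogram with mean and median reference lines is shown here. The scores are made up, and the original plot may well have been drawn with a different plotting package; this only illustrates the idea.

```r
# Toy averaged credit scores (invented values).
avg_score <- c(489.5, 649.5, 669.5, 689.5, 689.5, 709.5, 749.5, 809.5)

score_mean   <- mean(avg_score)
score_median <- median(avg_score)

# Null device so the sketch runs without a display; drop this line and
# the dev.off() call below to see the plot interactively.
pdf(NULL)
hist(avg_score, breaks = 10,
     main = "Average credit score", xlab = "Score")
abline(v = score_mean,   col = "blue",  lwd = 2)  # mean
abline(v = score_median, col = "green", lwd = 2)  # median
invisible(dev.off())
```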
I think this average credit score will give me more useful info regarding default probability (this is essentially the same way a loan officer would gauge risk, as far as I can tell).
It’s clear here that lower incomes have more defaults, and that as credit score moves up the number of defaults drops. Let’s confirm with a linear regression.
##
## Call:
## lm(formula = AverageCreditScore ~ DelinquenciesLast7Years + IncomeRange,
## data = loanData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -617.19 -35.08 -2.55 34.23 262.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 699.58224 2.34557 298.257 < 2e-16 ***
## DelinquenciesLast7Years -1.39244 0.01716 -81.146 < 2e-16 ***
## IncomeRange$1-24,999 -13.31695 2.44281 -5.451 5.01e-08 ***
## IncomeRange$100,000+ 25.00270 2.38652 10.477 < 2e-16 ***
## IncomeRange$25,000-49,999 -4.09927 2.36764 -1.731 0.08339 .
## IncomeRange$50,000-74,999 7.36446 2.36843 3.109 0.00187 **
## IncomeRange$75,000-99,999 15.69179 2.38762 6.572 4.98e-11 ***
## IncomeRangeNot displayed -72.88754 2.45125 -29.735 < 2e-16 ***
## IncomeRangeNot employed 7.82547 3.11875 2.509 0.01210 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58.3 on 112938 degrees of freedom
## (990 observations deleted due to missingness)
## Multiple R-squared: 0.1781, Adjusted R-squared: 0.178
## F-statistic: 3059 on 8 and 112938 DF, p-value: < 2.2e-16
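For most terms, the linear regression indicates a statistically significant relationship: nearly every coefficient is significant, and each additional delinquency is associated with a credit score about 1.4 points lower. The fit is modest, though. With an R-squared of 0.178, the model explains only about 18% of the variation in credit score.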
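For readers who want to reproduce this kind of check, here is a hedged sketch that fits the same model formula to simulated data and pulls out the coefficient table and R-squared. The data are random, so the numbers will not match the output above; only the formula and column names are taken from it.

```r
set.seed(42)
n <- 300

# Simulated stand-in for loanData (values are random, not real).
toy <- data.frame(
  DelinquenciesLast7Years = rpois(n, 4),
  IncomeRange = factor(sample(
    c("$25,000-49,999", "$50,000-74,999", "$100,000+"),
    n, replace = TRUE))
)
toy$AverageCreditScore <-
  700 - 1.4 * toy$DelinquenciesLast7Years + rnorm(n, sd = 55)

# Same formula as in the regression output.
fit <- lm(AverageCreditScore ~ DelinquenciesLast7Years + IncomeRange,
          data = toy)
fit_summary <- summary(fit)

coef(fit_summary)        # estimates, standard errors, t values, p-values
fit_summary$r.squared    # share of variance explained by the model
```

Reading the R-squared alongside the p-values helps separate "statistically detectable" from "practically strong", which matters with a sample of over 100,000 rows.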
Plot 1: Modified categorical overview of loan statuses
Plot 2: Average credit scores of the population, with the mean (blue) and median (green) scores marked
Plot 3: Number of delinquencies as credit score increases (further segmented by income range)
This was an interesting dataset to work with. I was surprised to find a lack of correlation in a lot of the data, particularly how little impact things like income had on default rate. It would be interesting to see additional data in this set, such as whether the individuals had ever been audited by the IRS, which might prove a positive indicator of potential to default. One of the things I like most about this data is the opportunity to use credit scores as indicators, as this is a pretty common use case in the real world.